AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)

Neural Information Processing SystemsMay-27-2025, 17:48:25 GMT

Provable Benefits of Complex Parameterizations for Structured State Space Models

Structured state space models (SSMs), the core engine behind prominent neural networks such as S4 and Mamba, are linear dynamical systems adhering to a specified structure, most notably diagonal. In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations. Theoretically explaining the benefits of complex parameterizations for SSMs is an open problem. The current paper takes a step towards its resolution, by establishing formal gaps between real and complex diagonal SSMs. Firstly, we prove that while a moderate dimension suffices in order for a complex SSM to express all mappings of a real SSM, a much higher dimension is needed for a real SSM to express mappings of a complex SSM.

complex parameterization, ssm, structured state space model, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Neural Information Processing SystemsMay-27-2025, 10:32:18 GMT

On Feature Learning in Structured State Space Models

This paper studies the scaling behavior of state-space models (SSMs) and their structured variants, such as Mamba, that have recently arisen in popularity as alternatives to transformer-based neural network architectures. Specifically, we focus on the capability of SSMs to learn features as their network width approaches infinity. Our findings reveal that established scaling rules, such as the Maximal Update Parameterization, fail to support feature learning as these models cannot be represented in the form of Tensor Programs. Additionally, we demonstrate that spectral scaling conditions, shown to be effective for feature learning in a host of other architectures, do not hold the same implications for SSMs. Through a detailed signal propagation analysis in SSMs, both forward and backward, we identify the appropriate scaling necessary for non-trivial feature evolution in the infinite-width limit. Our proposed scaling shows behavior akin to the Maximal Update Parameterization, such as improved stability, better generalization, and transferability of optimal hyper-parameters from small to large scale SSMs.

feature learning, maximal update parameterization, structured state space model, (2 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsMay-27-2025, 05:32:47 GMT

Structured State Space Models for In-Context Reinforcement Learning

Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNN's on a simple memory-based task. We evaluate our modified architecture on a set of partially-observable environments and find that, in practice, our model outperforms RNN's while also running over five times faster.

artificial intelligence, in-context reinforcement learning, machine learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Neural Information Processing SystemsJan-19-2025, 15:56:54 GMT

Structured State Space Models for In-Context Reinforcement Learning

Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNN's on a simple memory-based task. We evaluate our modified architecture on a set of partially-observable environments and find that, in practice, our model outperforms RNN's while also running over five times faster.

in-context reinforcement learning, sequence, structured state space model

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

arXiv.org Artificial IntelligenceJul-31-2024

Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms

Meng, Tian, Tao, Yang, Yin, Wuliang

Structured State Space Models (SSMs) have emerged as compelling alternatives to Transformer architectures, offering linear-time complexity and superior performance in various sequence modeling tasks. Despite their advantages, SSMs like the original Mamba-2 face training difficulties due to the sensitivities introduced by the extended series of recurrent matrix multiplications. In this paper, we propose an advanced architecture that mitigates these challenges by decomposing A-multiplications into multiple groups and optimizing positional encoding through Grouped Finite Impulse Response (FIR) filtering. This new structure, denoted as Grouped FIR-enhanced SSM (GFSSM), employs semiseparable matrices for efficient computation. Furthermore, inspired by the "attention sink" phenomenon identified in streaming language models, we incorporate a similar mechanism to enhance the stability and performance of our model over extended sequences. Our approach further bridges the gap between SSMs and Transformer architectures, offering a viable path forward for scalable and high-performing sequence modeling.

multiplication, sequence, structured state space model, (12 more...)

2408.00244

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Fillioux, Leo, Boyd, Joseph, Vakalopoulou, Maria, Cournède, Paul-Henry, Christodoulidis, Stergios

Structured State Space Models for Multiple Instance Learning in Digital Pathology

arXiv.org Artificial IntelligenceJun-27-2023

Multiple instance learning is an ideal mode of analysis for histopathology data, where vast whole slide images are typically annotated with a single global label. In such cases, a whole slide image is modelled as a collection of tissue patches to be aggregated and classified. Common models for performing this classification include recurrent neural networks and transformers. Although powerful compression algorithms, such as deep pre-trained neural networks, are used to reduce the dimensionality of each patch, the sequences arising from whole slide images remain excessively long, routinely containing tens of thousands of patches. Structured state space models are an emerging alternative for sequence modelling, specifically designed for the efficient modelling of long sequences. These models invoke an optimal projection of an input sequence into memory units that compress the entire sequence. In this paper, we propose the use of state space models as a multiple instance learner to a variety of problems in digital pathology. Across experiments in metastasis detection, cancer subtyping, mutation classification, and multitask learning, we demonstrate the competitiveness of this new class of models with existing state of the art approaches. Our code is available at https://github.com/MICS-Lab/s4

artificial intelligence, machine learning, sequence, (17 more...)

2306.15789

Country:

North America > United States (0.04)
Europe > France > Île-de-France (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMay-16-2023

Counterfactual Outcome Prediction using Structured State Space Model

Purohit, Vishal

Counterfactual outcome prediction in longitudinal data has recently gained attention due to its potential applications in healthcare and social sciences. In this paper, we explore the use of the state space model, a popular sequence model, for this task. Specifically, we compare the performance of two models: Treatment Effect Neural Controlled Differential Equation (TE-CDE) and structured state space model (S4Model). While TE-CDE uses controlled differential equations to address time-dependent confounding, it suffers from optimization issues and slow training. In contrast, S4Model is more efficient at modeling long-range dependencies and easier to train. We evaluate the models on a simulated lung tumor growth dataset and find that S4Model outperforms TE-CDE with 1.63x reduction in per epoch training time and 10x better normalized mean squared error. Additionally, S4Model is more stable during training and less sensitive to weight initialization than TE-CDE. Our results suggest that the state space model may be a promising approach for counterfactual outcome prediction in longitudinal data, with S4Model offering a more efficient and effective alternative to TE-CDE.

artificial intelligence, machine learning, te-cde, (12 more...)

2305.09207

Country: North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Mehari, Temesgen, Strodthoff, Nils

Advancing the State-of-the-Art for ECG Analysis through Structured State Space Models

arXiv.org Artificial IntelligenceNov-14-2022

The field of deep-learning-based ECG analysis has been largely dominated by convolutional architectures. This work explores the prospects of applying the recently introduced structured state space models (SSMs) as a particularly promising approach due to its ability to capture long-term dependencies in time series. We demonstrate that this approach leads to significant improvements over the current state-of-the-art for ECG classification, which we trace back to individual pathologies. Furthermore, the model's ability to capture long-term dependencies allows to shed light on long-standing questions in the literature such as the optimal sampling rate or window size to train classification models. Interestingly, we find no evidence for using data sampled at 500Hz as opposed to 100Hz and no advantages from extending the model's input size beyond 3s. Based on this very promising first assessment, SSMs could develop into a new modeling paradigm for ECG analysis.

artificial intelligence, deep learning, machine learning, (17 more...)

2211.07579

Country:

Europe > Germany > Berlin (0.04)
North America > United States (0.04)
Europe > Germany > Lower Saxony > Oldenburg (0.04)

Genre: Research Report > Promising Solution (0.66)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)